MATHEMATICS AND STATISTICS, WITH AN ELEMENTARY ACCOUNT
OF THE CORRELATION COEFFICIENT AND
THE CORRELATION RATIO. 1

By EDWARD V. HUNTINGTON, Harvard University. 

1. Introduction. — The first president of the Association, Professor E. R. 
Hedrick, in his retiring address in 1917, dwelt at length on the important position 
which this Association should take in relation to the large and growing field of 
applied mathematics. The Association should accept as perhaps its primary 
obligation the duty of interpreting the results of pure mathematics to the workers 
in the field of applied mathematics. This does not mean the "degradation of 
pure mathematics to utilitarian purposes." It means rather the search for 
identity of essential form among apparently diverse problems. When once the 
essential form of a problem has been recognized, the marvelously compact and 
sure analysis of formal mathematics either supplies directly the solution or 
illuminates the nature of the difficulty. This search for identity of form among 
the diversities of practical problems is then the task of the interpreter — a task 
which demands on the one hand a quick sympathy with the needs of the practical 
sciences, and on the other hand an unswerving loyalty to the rigorous ideals 
of the purely formal doctrines. The experience of the war has only served to 
emphasize the ever-growing need of such interpreters and the importance of the 
work of codification of problems which only they can perform. 

2. The Importance of Mathematics in Modern Statistics. "Biometrika." —
With this brief word of introduction, I desire to bring to the attention of the 
Association the opportunities for such interpretative service presented in a com- 
paratively new field of mathematics, namely, the field of mathematical statistics. 

Mathematicians as such seem to me to have been slow to enter this field. 
Of the professional mathematicians in this country only about a dozen have 
thought it worth while to join the American Statistical Association (one of the 
oldest learned societies in the United States, founded in 1839 and now having 
over 800 members). Of the published papers read before the American Mathe- 
matical Society during the last five years, only three or four have had any relation 
to statistics. The very terminology of modern statistical method is unfamiliar 
to the great majority of professional mathematicians. Few of us, I fancy, have 
ever heard of the "tetrachoric functions," "homoscedastic linear regressions," or 
"mesokurtic skew distributions." Most of the development of the science has 
been left to the economists, the actuaries, the biologists, the psychologists, and, 
more recently, the pedagogues. The result has been a wide scattering of the
literature of statistical theory; many theoretical results have been first developed
in articles having miscellaneous titles like "Family likeness in stature," "The
trend of the stock market," or "The reliability of spelling scales"; any unification
of effort was clearly lacking.

1 Retiring Address of the President of the Mathematical Association of America, read at the
summer meeting, Ann Arbor, Michigan, at a joint session of the Mathematical Association of
America, the American Mathematical Society, and the American Astronomical Society, Sept. 4,
1919. Complete references to papers cited only by title will be found in § 14.

A distinct epoch in the development of mathematical statistics was marked 
by the founding of Biometrika in 1901, by Karl Pearson, W. F. R. Weldon, and 
Francis Galton, for the express purpose of bringing together the biologist and 
mathematician into a partnership of mutual helpfulness. This journal has 
become a veritable storehouse of rapidly advancing statistical theory, not only of 
interest to the student of evolution, but also of fundamental importance to the 
statistician in every field of science; and it needs only a cursory glance through 
its pages to show what an essential part mathematics has played in this develop- 
ment. 

Among the mathematical topics which have proved useful in statistics may 
be mentioned the following, selected quite at random: the theory of probability 
in all its phases; determinants; conic sections and quadric surfaces (especially 
conjugate diameters); hypergeometric series; the Gamma function; Bernoulli's 
numbers; Stirling's Theorem for factorials; all kinds of interpolation- and 
quadrature-formulas; hyperbolic functions and properties of the catenary; rever- 
sion of series; differential equations of various types; and multiple integrals in 
n-dimensional space. Such mathematical subjects as these are part of the everyday
equipment of the biometricians of the Pearson school. Any adventuresome
biologist who tries to apply some technical statistical method without an adequate 
knowledge of its mathematical foundation is likely to call down on his head the 
righteous indignation of a ready and vigorous Pearsonian pen, which will ruth- 
lessly expose his ignorance! 

And yet the cry is always for more mathematics, and ever more. Many a 
vital problem in heredity is still unsolved solely because the difficulties of the 
mathematical analysis have not yet been surmounted. In a recent number, 
Pearson wrote: 

It is greatly to be desired that the "trigonometry" of higher dimensional plane space should 
be fully worked out, for all our relations between multiple correlation and partial correlation 
coefficients of n variates are properties of the "angles," "edges," and "perpendiculars" of sphero- 
polyhedra in multiple space. It would be a fine task for an adequately equipped pure mathe- 
matician to write a treatise on "spherical polyhedrometry"; he need not fear that his results 
would be without practical application, for they embrace the whole range of problems from 
anatomy to medicine, and from medicine to sociology and ultimately to the doctrine of evolution. 
(Biometrika, 11, 1916, p. 237.)

Who shall say that modern statistics is not a worthy field for mathematical 
endeavor? 

In the field of statistical method and theory, the most characteristic single 
problem is the problem of correlation. The establishment of the existence or 
non-existence of correlation between two things is the final goal of most statistical 
work. Of the several mathematical measures of correlation which have been 
proposed, the correlation coefficient and the correlation ratio are perhaps the most
fundamental. It may not be inappropriate, therefore, to devote the remainder
of this address to an elementary exposition of the meaning of these two quantities, 
not with the thought of adding anything essentially new to the theory, but in the 
hope of providing a convenient starting point from which some of us who are not 
already familiar with the subject of statistics may begin our study of its modern 
developments. 

3. The Central Problem of Statistics. Correlation between Two Functions,
$x(i)$, $y(i)$. — In the problem of correlation, what is sought for is some measure of
agreement or disagreement between two series of paired quantities, $x_1, x_2, x_3, \ldots, x_n$
and $y_1, y_2, y_3, \ldots, y_n$, where the pair $x_1, y_1$ are supposed
to belong to an "individual" numbered 1, the pair $x_2, y_2$
to an "individual" numbered 2, and so on; the total
number of individuals, $n$, being called the total "population."

For example, x and y may be the height and weight 
of individual men; or x may be the rainfall and y the 
price of wheat for individual years; and so on. 

In order to state the problem geometrically, we may
plot the x's and y's as ordinates against the individual-numbers
i as abscissas, as in Fig. 1; what we seek is
then some measure of agreement or disagreement between
the two curves or functions

$$x = x(i), \qquad y = y(i)$$

over a range of values $a \le i \le b$ (the functions being here
defined for only a finite number of values of i).

Fig. 1. x = age, y = income, of a certain group of men (n = 20). Means:
$\bar{x} = 30$ years, $\bar{y} = 2600$ dollars. Standard deviations:
$\sigma_x = 4.025$ years, $\sigma_y = 417.1$ dollars. Correlation coefficient: r = .855.

4. Notation: $x, y$; $\xi, \eta$; $X, Y$. — In order to reduce
these two curves to a comparable basis, it is convenient
to take two preliminary steps, one concerning the base-line, or origin, the other
concerning the scale, or unit of measurement.

As to the base-line, we agree to refer each of the given series to the (arithmetical)
mean of that series as origin, the means being given by

$$\bar{x} = \frac{1}{n}\sum (x_i), \qquad \bar{y} = \frac{1}{n}\sum (y_i).$$

In other words, instead of plotting the given x's and y's, we plot the "deviations
from the means," $\xi$ and $\eta$, where

$$\xi_i = x_i - \bar{x}, \qquad \eta_i = y_i - \bar{y}.$$

As to the scale, we agree to take as the unit of measurement for each series
the "standard deviation" of that series, namely

$$\sigma_x = \sqrt{\frac{1}{n}\sum (\xi_i^2)}, \qquad \sigma_y = \sqrt{\frac{1}{n}\sum (\eta_i^2)},$$

where the "standard deviation," $\sigma_x$ (or $\sigma_y$), may be interpreted as the radius of
gyration of the x-curve (or y-curve) about a horizontal axis through its mean
height. In other words, instead of plotting the x's and y's or the $\xi$'s and $\eta$'s in
their original units (kilograms, dollars, degrees, or what not), we plot the
"ratios" X and Y (as in Fig. 2), where

$$X_i = \xi_i / \sigma_x, \qquad Y_i = \eta_i / \sigma_y.$$

These X's and Y's will be pure numbers, independent
of the original units, and having the following
properties:

$$\frac{1}{n}\sum (X_i) = 0, \qquad \frac{1}{n}\sum (Y_i) = 0;$$

$$\frac{1}{n}\sum (X_i^2) = 1, \qquad \frac{1}{n}\sum (Y_i^2) = 1.$$

That is, the arithmetical mean of X and the arithmetical
mean of Y are both zero, while their standard deviations (or radii of gyration)
are both unity.

For the purpose of studying correlation, the reduced curves formed by plotting
$X_i$ and $Y_i$ against i will be more convenient than the original curves formed by
plotting $x_i$ and $y_i$ against i; and any criterion of agreement or disagreement
between the X's and Y's may be accepted as a valid criterion of agreement or
disagreement between the x's and y's.

Note. — The term "standard deviation" was introduced by Pearson in 1894. 

It should be noted that in case x (or y) is a constant, X (or Y) will be inde- 
terminate. This case, which presents no special interest, will be excluded from 
consideration in what follows. 
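
The reduction described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `standardize` and the five ages are hypothetical, and the divisor n (rather than n - 1) follows the definition of the standard deviation given above.

```python
from math import sqrt

def standardize(values):
    """Return (mean, standard deviation, reduced values) for one series."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]           # the xi's (or eta's)
    sigma = sqrt(sum(d * d for d in deviations) / n)  # divisor n, as in the text
    if sigma == 0:
        # A constant series makes the reduced values indeterminate.
        raise ValueError("constant series: reduced values are indeterminate")
    return mean, sigma, [d / sigma for d in deviations]

ages = [25, 28, 30, 33, 34]  # hypothetical data
mean_x, sigma_x, X = standardize(ages)
```

The reduced values `X` have mean zero and mean square one, whatever units the original ages were recorded in.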

5. The Correlation Coefficient r. Definition and Properties.— The most 
widely accepted measure of correlation between two given series or curves x
and y, is the Pearsonian (or "product-moment") coefficient of correlation, r, which
may be defined in terms of the reduced quantities X and Y by the simple formula

$$r = \frac{1}{n}\sum (X_i Y_i).$$

That is, r is the mean product of corresponding pairs of values of X and Y.
By aid of the simple transformations

$$r = 1 - \frac{1}{2}\cdot\frac{1}{n}\sum \left[(X_i - Y_i)^2\right],$$

$$r = -1 + \frac{1}{2}\cdot\frac{1}{n}\sum \left[(X_i + Y_i)^2\right],$$

it is easily shown that r cannot exceed 1 in absolute value. The case r = +1,
called the case of perfect positive correlation, will occur when and only when the
X and Y curves coincide. The case r = -1, called perfect negative correlation,
will occur when and only when the Y curve is the reflection of the X curve in the
axis of i. Either case will occur when and only when the $\eta$'s are directly proportional
to the $\xi$'s; or, what amounts to the same thing, when and only when
the original y's and x's are connected by a linear equation

$$\frac{y_i - \bar{y}}{\sigma_y} = \pm\,\frac{x_i - \bar{x}}{\sigma_x}.$$

We see, therefore, that in the Pearsonian sense, perfect correlation (positive or
negative) between two sets of quantities x and y means nothing more nor less than
the existence of a linear algebraic equation connecting those quantities. Indeed, a
better name for the coefficient of correlation might be the "coefficient of linear
relationship."

In general, the given sets of values will not be linearly related, and the value
of r will be less than 1 in absolute value.

6. Relation of r to the Method of Least Squares. — The significance of the 
coefficient of correlation, r, may be brought out further by the following considerations.
In the expression

$$r = 1 - \frac{1}{2}\cdot\frac{1}{n}\sum \left[(X_i - Y_i)^2\right],$$

the quantity

$$\Delta = \frac{1}{2}\cdot\frac{1}{n}\sum \left[(X_i - Y_i)^2\right]$$

may be regarded as a measure of the total discrepancy, in the sense of the method
of least squares, between the curves X and Y. Clearly, when the discrepancy $\Delta$
is zero, or the curves coincide, the correlation r is perfect (r = 1); and as the
discrepancy increases the correlation decreases. Hence r is seen to be a suitable
measure of the degree of approach to coincidence of the two curves X and Y.

We note that as r varies from +1 to -1, $\Delta$ will vary from 0 to 2, the value
r = 0 corresponding to $\Delta = 1$.

A further connection with the method of least squares will be noted below (§ 9). 
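
As an illustrative sketch, the definition of r as a mean product and its relation to the least-squares discrepancy can be checked numerically. The helper names and the small data set are hypothetical, and the discrepancy is assumed here to be half the mean squared difference of the reduced values, so that it equals 1 - r.

```python
from math import sqrt

def reduce_series(values):
    """Refer a series to its mean and measure it in standard-deviation units."""
    n = len(values)
    mean = sum(values) / n
    sigma = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sigma for v in values]

def correlation(xs, ys):
    """Mean product of the reduced values X and Y."""
    X, Y = reduce_series(xs), reduce_series(ys)
    return sum(a * b for a, b in zip(X, Y)) / len(X)

def discrepancy(xs, ys):
    """Half the mean squared difference between the reduced curves."""
    X, Y = reduce_series(xs), reduce_series(ys)
    return sum((a - b) ** 2 for a, b in zip(X, Y)) / (2 * len(X))

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 1, 4, 3, 5]
r = correlation(xs, ys)
delta = discrepancy(xs, ys)

# Any exact linear relation gives perfect correlation, positive or negative.
r_linear_up = correlation(xs, [3 * x + 7 for x in xs])
r_linear_down = correlation(xs, [10 - 2 * x for x in xs])
```

For these data r is 0.8 and the discrepancy is 0.2; the two linear examples give +1 and -1 up to rounding.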

7. Equivalent Formulas for r. The Case of a Continuous Variable. — Equiva- 
lent and more familiar formulas for r are 

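$$r = \frac{\sum (\xi\eta)}{n\,\sigma_x \sigma_y}, \qquad \text{or} \qquad r = \frac{\sum (\xi\eta)}{\sqrt{\sum (\xi^2)}\,\sqrt{\sum (\eta^2)}}.$$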

The importance of the product-factor $\sum (\xi\eta)$ was evident in the work of
Bravais in 1846, and was re-discovered and applied by Galton in 1886-1888; 
but the complete expression for r, in the form just stated, was first given by 
Pearson in 1896. [Compare also L. March (1905).] 

For purposes of numerical computation, the following formula is to be preferred
in practice (J. A. Harris, Amer. Naturalist, Vol. 44, pp. 693-699, 1910;
L. L. Thurstone, Psychological Bulletin, Vol. 14, pp. 28-32, 1917):

$$r = \frac{\frac{1}{n}\sum (xy) - \bar{x}\bar{y}}{\sigma_x \sigma_y},$$

where, in the denominator,

$$\sigma_x = \sqrt{\frac{1}{n}\sum (x^2) - \bar{x}^2} \qquad \text{and} \qquad \sigma_y = \sqrt{\frac{1}{n}\sum (y^2) - \bar{y}^2}$$

are the standard deviations of the x's and y's.
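
The computing formula works directly from the raw sums, without first forming deviations. A minimal sketch follows; the function name `correlation_raw` and the data are hypothetical, and no attempt is made to guard against rounding loss in long series.

```python
from math import sqrt

def correlation_raw(xs, ys):
    """Correlation coefficient from raw sums of x, y, x*y, x^2 and y^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    sigma_x = sqrt(sum(x * x for x in xs) / n - mean_x ** 2)
    sigma_y = sqrt(sum(y * y for y in ys) / n - mean_y ** 2)
    return (mean_xy - mean_x * mean_y) / (sigma_x * sigma_y)

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 1, 4, 3, 5]
r_raw = correlation_raw(xs, ys)
```

Here the mean product of x and y is 10.6, the product of the means is 9, and both variances are 2, so the formula gives 1.6 / 2 = 0.8.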

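We note in passing that if the n individual-numbers, i, are replaced by a
continuous variable, t, the formulas above become modified as follows (the
summations being replaced by integrations): Given $x = f(t)$ and $y = \varphi(t)$ for
the range $a \le t \le b$. Let

$$\bar{x} = \frac{1}{b-a}\int_a^b x\,dt, \qquad \bar{y} = \frac{1}{b-a}\int_a^b y\,dt;$$

$$\xi = x - \bar{x}, \qquad \eta = y - \bar{y};$$

$$\sigma_x = \sqrt{\frac{1}{b-a}\int_a^b \xi^2\,dt}, \qquad \sigma_y = \sqrt{\frac{1}{b-a}\int_a^b \eta^2\,dt};$$

and

$$X = \frac{\xi}{\sigma_x}, \qquad Y = \frac{\eta}{\sigma_y}.$$

Then

$$r = \frac{1}{b-a}\int_a^b XY\,dt,$$

and

$$\frac{1}{b-a}\int_a^b X\,dt = 0, \quad \frac{1}{b-a}\int_a^b Y\,dt = 0; \quad \frac{1}{b-a}\int_a^b X^2\,dt = 1, \quad \frac{1}{b-a}\int_a^b Y^2\,dt = 1.$$
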

8. The Correlation Graph, or Scatter-Diagram. — Up to this time we have 
thought of the values of $x_i$ and $y_i$ as functions of i, and have plotted them as
two separate curves on the axis of i as a base. It is often more convenient to
think of y as a function of x, and to plot the graph of y
against x in the ordinary way. The result will not, however,
be an ordinary graph, since y will not, in general,
be a single-valued function of x. To any value of x may
correspond many values of y, any one of which may be
repeated more than once. A typical graph of y as a function
of x will therefore appear as in Fig. 3, called a correlation
graph (Galton, 1888), in which every dot (x, y)
represents a pair of values of x and y belonging to some
individual, and the total number of dots is equal to the
total population, n. (Multiple dots would be indicated in the figure by [...].)

Fig. 3. Correlation graph.

In practice, such a graph is usually divided into squares of suitable size,
and the number of dots falling within each square is indicated by a numeral
written within that square. This gives a "correlation table" for x and y. Or,
each such numeral may be replaced by a z-ordinate of corresponding height,
erected at the middle of the square, and a sort of tent-cloth spread over the tops
of these ordinates. This gives a "correlation surface," the height of which at
any point of the x, y plane shows the density of the distribution of the dots at
that point. For our present purpose, however, the simple correlation graph or
"scatter-diagram," is all that we shall need.

9. Regression Lines and Coefficients of Regression. — Let us now seek a 
linear expression that shall give us most accurately the value of y corresponding 
to any given x. 

In the special case when r = 1 we know (§ 5) that y is simply a linear function
of x, and the equation

$$y - \bar{y} = \frac{\sigma_y}{\sigma_x}(x - \bar{x}) \qquad \text{or} \qquad Y = X$$

will give y exactly in terms of x (or Y in terms of X). In this special case all the
dots in the correlation graph will lie on this line. In the general case (r not equal
to 1), we may write an approximate equation

$$y' - \bar{y} = \lambda\,\frac{\sigma_y}{\sigma_x}(x - \bar{x}) \qquad \text{or} \qquad Y' = \lambda X$$

and seek to determine the arbitrary factor $\lambda$ so that the values of y' (or Y')
obtained from this equation shall be equal as nearly as possible to the true values
of y (or Y). That is, we seek to determine $\lambda$ so that the "least-squares" total
error,

$$e_y^2 = \frac{1}{n}\sum \left[(y' - y)^2\right] \qquad \text{or} \qquad E_y^2 = \frac{e_y^2}{\sigma_y^2} = \frac{1}{n}\sum \left[(Y' - Y)^2\right],$$

shall be a minimum. An easy transformation (using the simpler notation) gives

$$E_y^2 = 1 - r^2 + (\lambda - r)^2,$$

which will clearly be a minimum when $\lambda = r$. Hence the "best" straight line
to give y in terms of x will be

$$y' - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}(x - \bar{x}) \qquad \text{or} \qquad Y' = rX,$$

while the amount of total error involved in using y' for y (or Y' for Y) is

$$e_y^2 = \sigma_y^2(1 - r^2) \qquad \text{or} \qquad E_y^2 = 1 - r^2.$$

Clearly, if r = 1, this error is zero; and if r = 0, the error takes its maximum
value. Hence again we see that the value of r is a suitable criterion of the
approach to linear relationship of the variables x and y.

The straight line just obtained is called the line of regression of y on x, and the
factor $r(\sigma_y/\sigma_x)$ is called the coefficient of regression of y on x.
Similarly, the straight line

$$x' - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}(y - \bar{y}) \qquad \text{or} \qquad X' = rY$$

will be the "best" straight line for giving x in terms of y, the "total error" being

$$e_x^2 = \sigma_x^2(1 - r^2) \qquad \text{or} \qquad E_x^2 = 1 - r^2.$$

This line is called the line of regression of x on y, and the factor $r(\sigma_x/\sigma_y)$ is called
the coefficient of regression of x on y.

The two lines of regression will not coincide unless r = ± 1.

It will be observed that the coefficient of correlation, r, is the geometrical mean 
of the two coefficients of regression. This fact is the starting point for the theory 
of "partial correlation coefficients" for any number of variables (Yule, 1897 
and 1907), into which we cannot enter here. 1 
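
The regression results of this section lend themselves to a short numerical check. The following is a sketch only: the function name `summary` and the data are hypothetical, and the "geometrical mean" is computed as the square root of the product of the two coefficients, which gives the absolute value of r.

```python
from math import sqrt

def summary(xs, ys):
    """Return the means, standard deviations (divisor n) and r for two series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * sx * sy)
    return mx, my, sx, sy, r

xs = [1, 2, 3, 4, 5]        # hypothetical data
ys = [20, 10, 40, 30, 50]
mx, my, sx, sy, r = summary(xs, ys)

b_yx = r * sy / sx          # coefficient of regression of y on x
b_xy = r * sx / sy          # coefficient of regression of x on y

# Values of y' read from the line of regression of y on x.
predicted = [my + b_yx * (x - mx) for x in xs]
error_sq = sum((p - y) ** 2 for p, y in zip(predicted, ys)) / len(xs)
```

With these data r = 0.8, the two coefficients are 8 and 0.08, and the mean squared error about the line equals the variance of y multiplied by 1 - r squared.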

10. Curves of the Means, or Regression Curves. — The correlation graph,
with its scattered dots, may, for some purposes, be simplified as follows.

Thinking of y as a function of x, let us replace each column by a representative
dot located at the mean of that column. We thus obtain what is called the curve
of the means of the columns, or the regression curve of y on x (Pearson, 1896).

Or again, thinking of x as a function of y, we may replace each row by a
representative dot located at the mean of that row, thus obtaining the curve of
the means of the rows or the regression curve of x on y. (See Figures 4 and 5.)

Fig. 4. Curve of the means of the columns, and line of regression of y on x. Coefficient of
regression (y on x) = .886; correlation ratio (y on x) = $\eta_{yx}$ = .941.

Fig. 5. Curve of the means of the rows, and line of regression of x on y. Coefficient of
regression (x on y) = .825; correlation ratio (x on y) = $\eta_{xy}$ = .916.

If the curves of the means happen to be straight lines, we have the case of
linear regression, which was the earliest, and for a time the only case considered
(Galton, 1888; Pearson, 1895; Pearson, 1905). In this case it can be readily
shown that the curves of the means, or curves of regression, will coincide with the
lines of regression defined in § 9.

1 An excellent example of the application of partial correlation coefficients in practical statistics
may be found in W. F. Ogburn's "Analysis of the standards of living in the District of Columbia
in 1916," Quart. Pub. Amer. Statistical Association, vol. 16, June, 1919, pp. 374-389.

In order to prove this most clearly, for the case, say, of the column-means,
let us adopt the following notation, expressed, for convenience, in terms of
X and Y (§ 4):

$X$ = the abscissa of any column

$n_x$ = the number of dots in the column

$Y_x$ = the ordinate of any dot in the column

$\bar{Y}_x$ = the mean of the column; that is,

$$\bar{Y}_x = \frac{1}{n_x} S_x(Y_x),$$

where $S_x$ denotes summation from dot to dot within the column (X = constant).
These $\bar{Y}_x$'s are the ordinates of the curve of the means in question. If this
curve is a straight line, then we must have

$$\bar{Y}_x = \mu + \lambda X,$$

where $\mu$ and $\lambda$ are some constants. But, summing from dot to dot over the
whole table,

$$\frac{1}{n}\sum (\bar{Y}_x) = \frac{1}{n}\sum \mu + \frac{1}{n}\sum (\lambda X),$$

or 1

$$\frac{1}{n}\sum (Y) = \mu + \lambda\,\frac{1}{n}\sum (X),$$

whence

$$\mu = 0.$$

Again, multiplying each side by X, and summing as before,

$$\frac{1}{n}\sum (X\bar{Y}_x) = \lambda\,\frac{1}{n}\sum (X^2),$$

or 2

$$\frac{1}{n}\sum (XY) = \lambda(1),$$

whence

$$\lambda = r.$$

Hence, if the curve is a straight line at all, it must be the line

$$\bar{Y}_x = rX,$$

which is precisely the regression line of y on x.

1 Thus, if $S_A$ denotes summation from column to column, while $S_x$ denotes summation from
dot to dot within the column X, then

$$\frac{1}{n}\sum (\bar{Y}_x) = \frac{1}{n} S_A\left[S_x(\bar{Y}_x)\right] = \frac{1}{n} S_A\left[S_x(Y)\right] = \frac{1}{n}\sum (Y) = 0.$$

2 Thus,

$$\frac{1}{n}\sum (X\bar{Y}_x) = \frac{1}{n} S_A\left[S_x(X\bar{Y}_x)\right] = \frac{1}{n} S_A\left[X S_x(\bar{Y}_x)\right] = \frac{1}{n} S_A\left[X S_x(Y)\right] = \frac{1}{n} S_A\left[S_x(XY)\right] = \frac{1}{n}\sum (XY) = r.$$

11. The Correlation Ratio, $\eta_{yx}$. Definition and Properties. — Whether the
curve of the means of the columns is linear or not, the whole system of dots in a
correlation graph may be regarded as forming a band of more or less irregular
width along this curve; and the narrower this band, the more nearly will y be a
single-valued function of x.

On the other hand, the dots may be regarded as forming a band along the
curve of the means of the rows; and the narrower this band, the more nearly
will x be a single-valued function of y.

Considerations of this sort led Pearson in 1905 to introduce two new measures
of correlation, $\eta_{yx}$ and $\eta_{xy}$, called the correlation ratio of y on x and the correlation
ratio of x on y respectively. If one speaks simply of the correlation ratio, $\eta$, one
usually means the correlation ratio of y on x. The definition and properties of
this quantity may be explained as follows.

In order to secure a measure of the width of the band of dots about the curve
of the means of the columns, we begin by letting

$d_x$ = the standard deviation of a column, or the radius of gyration of the
column about its mean; that is,

$$d_x = \sqrt{\frac{1}{n_x} S_x\left[(Y_x - \bar{Y}_x)^2\right]},$$

where $S_x$ denotes summation from dot to dot within the column (§ 10). These
$d_x$'s will measure the "scatter" of the dots in each column about the mean of
that column.

What we then need, as a measure of the total "scatter" of the dots about the
curve of the means, is some kind of average of the $d_x$'s for all the columns.

The most obvious average to take for this purpose would be either the mean
of the $d_x$'s, namely,

$$D' = \frac{1}{A} S_A(d_x),$$

where $S_A$ denotes summation from column to column, and A is the number of
columns, or else the standard deviation of the $d_x$'s, namely,

$$D'' = \sqrt{\frac{1}{A} S_A(d_x^2)}.$$

It is found more convenient, however, to take a sort of pseudo-standard deviation,
namely,

$$D = \sqrt{\frac{1}{n} S_A(n_x d_x^2)},$$

in which each $d_x^2$ is "weighted" by the number of dots in that column, and the
sum divided, consequently, by the total number of dots, n, instead of by the
number of columns, A. 1

1 Here D' is the mean height and D'' the radius of gyration of the "scedastic curve" formed
by plotting $d_x$ against X. Both these averages were considered by Pearson (1905, p. 10), but
immediately abandoned in favor of D.

This quantity D may be transformed without difficulty as follows: 1

$$D^2 = \frac{1}{n} S_A\left[S_x(Y_x - \bar{Y}_x)^2\right] = 1 - \frac{1}{n} S_A(n_x \bar{Y}_x^2).$$

The correlation ratio of y on x is then defined as

$$\eta_{yx} = \sqrt{\frac{1}{n} S_A(n_x \bar{Y}_x^2)},$$

where $S_A$ denotes, as before, summation from column to column.

This quantity $\eta_{yx}$ clearly lies between 0 and 1; and the relation $\eta_{yx}^2 = 1 - D^2$
shows that the case $\eta_{yx} = 1$ will occur when and only when the band of dots narrows
down into coincidence with the curve of the means of the columns; that is, when and
only when y is a single-valued function of x. The narrower the band, the more
nearly will $\eta_{yx}$ approach 1.

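In terms of the original variables, x and y, the definition becomes

$$\eta_{yx} = \frac{1}{\sigma_y}\sqrt{\frac{1}{n} S_A(n_x \bar{y}_x^2) - \bar{y}^2},$$

which, with $\bar{y}_x$ denoting the mean of the y's in the column, is the most convenient form for computation.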

The correlation ratio of x on y, namely $\eta_{xy}$, is obtained by simply interchanging
x and y and replacing $S_A$ by $S_B$ (to denote summation from row to row).

It should be especially noted that the significance of the two correlation ratios 
is concerned with the single-valuedness of the connection between x and y, 
rather than with any particular form of connecting equation. Indeed, they 
might well be called the "coefficients of single-valued relationship."
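
The contrast between the correlation ratio and the correlation coefficient is easily seen on a small constructed example. The sketch below is illustrative only: the function names are hypothetical, each distinct value of x is assumed to form its own column, and the data (y equal to the square of x) are chosen so that y is a single-valued but non-linear function of x.

```python
from collections import defaultdict
from math import sqrt

def correlation_ratio(xs, ys):
    """Correlation ratio of y on x, one column per distinct value of x."""
    n = len(ys)
    mean_y = sum(ys) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    columns = defaultdict(list)
    for x, y in zip(xs, ys):
        columns[x].append(y)
    # Weighted mean square of the column means about the general mean.
    between = sum(
        len(col) * (sum(col) / len(col) - mean_y) ** 2 for col in columns.values()
    ) / n
    return sqrt(between / var_y)

def correlation(xs, ys):
    """Product-moment correlation coefficient (divisor n throughout)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * sx * sy)

xs = [-2, -1, 0, 1, 2]          # hypothetical data
ys = [x * x for x in xs]        # y is a single-valued function of x

eta_yx = correlation_ratio(xs, ys)   # y on x
eta_xy = correlation_ratio(ys, xs)   # x on y: columns are now the rows
r = correlation(xs, ys)
```

Here the correlation ratio of y on x is 1, because every column contains a single dot, while r is 0 and the correlation ratio of x on y is also 0, since each row mean coincides with the general mean of x.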

For the case of more than two variables, definitions of "partial" and 
"multiple" correlation ratios have been given by Pearson (1915). 

12. Relation between $\eta_{yx}$ and r. — An important relation between the correlation
ratio, $\eta_{yx}$, and the correlation coefficient, r, may be brought out by a further
study of the curve of the means of the columns (the regression curve of y on x).

If the curve of the means happens to be a straight line, we have the case of 
linear regression (§ 10). 



1 Thus:

$$\begin{aligned}
D^2 &= \frac{1}{n} S_A\left[S_x(Y_x - \bar{Y}_x)^2\right] \\
&= \frac{1}{n} S_A\left[S_x(Y_x^2)\right] - 2\,\frac{1}{n} S_A\left[\bar{Y}_x S_x(Y_x)\right] + \frac{1}{n} S_A\left[S_x(\bar{Y}_x^2)\right] \\
&= \frac{1}{n}\sum (Y^2) - 2\,\frac{1}{n} S_A(\bar{Y}_x n_x \bar{Y}_x) + \frac{1}{n} S_A(n_x \bar{Y}_x^2) \\
&= 1 - \frac{1}{n} S_A(n_x \bar{Y}_x^2).
\end{aligned}$$

If the curve of the means is not a straight line, we may seek the straight line
which fits the curve with a minimum total discrepancy.

Let $Y' = \mu + \lambda X$ be the required straight line, where $\mu$ and $\lambda$ are to be determined
so as to make the total discrepancy between the line and the curve a
minimum.

As the expression for the total discrepancy, it would be most natural to take
the "least-squares" error

$$\frac{1}{A} S_A\left[(\mu + \lambda X - \bar{Y}_x)^2\right],$$

where $S_A$ denotes summation from column to column, and A = the number of
columns. It proves more convenient, however, to take a "pseudo-least-squares"
expression

$$E^2 = \frac{1}{n} S_A\left[n_x(\mu + \lambda X - \bar{Y}_x)^2\right],$$

in which the square of the difference in each column is "weighted" by the number
of dots in that column before summing. This quantity $E^2$ may be transformed,
by a straightforward process, 1 into

$$E^2 = \eta_{yx}^2 - r^2 + \mu^2 + (\lambda - r)^2,$$

where $\eta_{yx}$ has the value defined above (§ 11).

Evidently, to make E a minimum we must put $\mu = 0$ and $\lambda = r$. Hence the
line which best fits the curve of the means of the columns (in the pseudo-least-squares
sense) is simply the line of regression (§ 9):

$$Y' = rX \qquad \text{or} \qquad y' - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}(x - \bar{x}).$$

The total discrepancy between the line and the curve is

$$E^2 = \eta_{yx}^2 - r^2 \qquad \text{or} \qquad e^2 = \sigma_y^2(\eta_{yx}^2 - r^2).$$

From this last equation we see that the correlation ratio, $\eta_{yx}$, will be equal to the
correlation coefficient, r, when and only when the regression of y on x is linear;
that is, when and only when the curve of the means of the columns reduces to
a straight line.
1 Noting that

$$\frac{1}{n} S_A(n_x X) = \frac{1}{n}\sum (X) = 0,$$

$$\frac{1}{n} S_A(n_x X^2) = \frac{1}{n}\sum (X^2) = 1,$$

$$\frac{1}{n} S_A(n_x \bar{Y}_x) = \frac{1}{n} S_A\left[S_x(Y_x)\right] = \frac{1}{n}\sum (Y) = 0,$$

and

$$\frac{1}{n} S_A(X n_x \bar{Y}_x) = \frac{1}{n} S_A\left[X S_x(Y_x)\right] = \frac{1}{n} S_A\left[S_x(XY_x)\right] = \frac{1}{n}\sum (XY) = r.$$

Similarly, $\eta_{xy} = r$ when and only when the regression of x on y is linear.

Thus, in the case of linear regression, it makes no difference whether we use
the correlation ratio or the correlation coefficient. In the case of non-linear
regression, the correlation ratio is usually to be preferred.

It should be noted, in this connection, that one of the curves of the means
may be a straight line and the other not, so that we may have "linear regression"
for y on x and not for x on y.
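
The relation between the weighted discrepancy and the difference of the squares of the correlation ratio and the correlation coefficient can be verified on a small table. The following sketch assumes one column per distinct value of x; the function name `linearity_check` and both data sets are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def linearity_check(xs, ys):
    """Return (eta squared, r squared, E squared), all in reduced units."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sigma_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    X = [(x - mean_x) / sigma_x for x in xs]
    Y = [(y - mean_y) / sigma_y for y in ys]
    r = sum(a * b for a, b in zip(X, Y)) / n

    columns = defaultdict(list)      # one column per distinct abscissa
    for a, b in zip(X, Y):
        columns[a].append(b)

    eta_sq = 0.0
    e_sq = 0.0
    for a, col in columns.items():
        col_mean = sum(col) / len(col)
        eta_sq += len(col) * col_mean ** 2 / n           # weighted column means
        e_sq += len(col) * (r * a - col_mean) ** 2 / n   # line Y' = rX vs. curve
    return eta_sq, r * r, e_sq

# Column means 2, 3, 7: the curve of the means is not a straight line.
eta_sq, r_sq, e_sq = linearity_check([1, 1, 2, 2, 3, 3], [1, 3, 2, 4, 6, 8])

# Column means 2, 4, 6: linear regression, so the discrepancy vanishes.
eta_lin, r_lin, e_lin = linearity_check([1, 1, 2, 2, 3, 3], [1, 3, 3, 5, 5, 7])
```

In the first table the squared correlation ratio is 28/34 and the squared coefficient is 25/34, so the discrepancy is 3/34; in the second the two measures agree and the discrepancy is zero up to rounding.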

13. Conclusion. Remarks on the Probable Error. — We have thus completed 
our sketch of the elementary theory of the correlation coefficient and the correla- 
tion ratio. Space does not permit any account of other methods of measuring 
correlation, such as the method of contingency, the method of correlation by 
ranks (including Spearman's "Foot-Rule"), the method of four-fold division, 
etc., which have been devised for special purposes (compare Pearson, 1907; 
Ritchie-Scott, 1918). Nor can we discuss the precautions that are necessary 
in practical computation on account of "corrections for grouping" in the 
correlation table. 

At least a word must, however, be said in regard to the question of probable
errors. The probable error of the correlation coefficient, r, is usually given as

$$.67449\,\frac{1 - r^2}{\sqrt{n}},$$

and that of the correlation ratio, $\eta_{yx}$, as roughly,

$$.67449\,\frac{1 - \eta_{yx}^2}{\sqrt{n}},$$


while the probable error of the quantity $\zeta = \eta_{yx}^2 - r^2$ (used as a test for linearity
of regression) is, still more roughly,

$$.67449\,\frac{2\sqrt{\zeta}}{\sqrt{n}},$$

and we are continually warned that no confidence can be placed in any of these
quantities unless their value is three or four times their probable error.
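
As a worked illustration, the usual probable-error expression for r can be evaluated for the figures quoted in the caption of Fig. 1 (r = .855, n = 20). The function name `probable_error_r` is hypothetical, and the result carries all the assumptions on which the formula itself rests.

```python
from math import sqrt

PROBABLE_ERROR_FACTOR = 0.67449  # constant quoted in the text

def probable_error_r(r, n):
    """Usual probable error of a correlation coefficient r from n individuals."""
    return PROBABLE_ERROR_FACTOR * (1 - r * r) / sqrt(n)

r, n = 0.855, 20                 # values from the caption of Fig. 1
pe = probable_error_r(r, n)
ratio = r / pe                   # how many probable errors r amounts to
```

The probable error comes out at about 0.04, so that r is roughly twenty times its probable error, well beyond the "three or four times" rule of thumb.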

Now what is the real significance of these probable errors? They have 
required some extraordinarily intricate mathematics for their determination; 
it is hardly possible to explain their essential nature in a few words. This much, 
can, however, be said. The question is primarily a question of what is technically 
known as the "errors of random sampling." Unless the statistical material 
with which we are dealing is a "sample" of a larger statistical "population," no 
question of "probable error" will arise. 

The simplest case may perhaps be stated as follows: Suppose a correlation
table exists, with a population N, and an unknown correlation coefficient, r;
and suppose a sample population, n, drawn from N, is found to have a correlation
coefficient $r_1$. Another sample of the same size would doubtless have a different
coefficient, $r_2$. Suppose the total number of samples of this size that can be
drawn from N is m, and let

$$r_1, r_2, r_3, \ldots, r_m$$

be the correlation coefficients for these m samples. Further, let $\bar{r}$ be the mean
of these r's, which we may take to be equal to the true value r, and let $\rho$ be their
standard deviation:

$$\rho = \sqrt{\frac{1}{m}\sum \left[(r_k - \bar{r})^2\right]}.$$

Then if the r's are assumed to be distributed according to the normal law of error,
it can be shown that half of them will lie between $r + 0.6745\rho$ and $r - 0.6745\rho$,
and the other half outside those limits. When we say, then, that an observed
value, $r_k$, has a "probable error" of $p = .6745\rho$, or that the true value of r is
as likely to lie within as without the limits

$$r_k + p \qquad \text{and} \qquad r_k - p,$$

we mean simply that if we actually tested all the mathematically possible samples,
half the results would lie within these limits and half without; the important point
being that we do not know to which half our particular sample may belong. Any
statement as to the size of a probable error is always a statement of our ignorance;
but even this ignorance may give wise information, for while all ignorance is
deplorable, some ignorance is more deplorable than others. As a general rule,
the smaller the size of the sample, the greater the depth of our ignorance.
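
The sampling interpretation just described can be imitated by simulation. The sketch below is an assumption-laden illustration, not the procedure of the text: instead of all m possible samples it draws 1000 random samples of 50 from an artificial population of 2000 pairs, and the population itself (normally distributed, with a built-in correlation near 0.6) is invented for the purpose.

```python
import random
from math import sqrt

def correlation(xs, ys):
    """Product-moment correlation coefficient (divisor n throughout)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * sx * sy)

rng = random.Random(0)           # fixed seed so the run is repeatable

# Hypothetical population of N = 2000 paired values.
population = []
for _ in range(2000):
    x = rng.gauss(0, 1)
    y = 0.6 * x + 0.8 * rng.gauss(0, 1)
    population.append((x, y))
population_r = correlation(*zip(*population))

# Correlation coefficients of many random samples of size n = 50.
sample_rs = []
for _ in range(1000):
    xs, ys = zip(*rng.sample(population, 50))
    sample_rs.append(correlation(xs, ys))

m = len(sample_rs)
mean_r = sum(sample_rs) / m
rho = sqrt(sum((r - mean_r) ** 2 for r in sample_rs) / m)

# Fraction of sample coefficients within 0.6745 * rho of their mean.
inside = sum(1 for r in sample_rs if abs(r - mean_r) <= 0.6745 * rho)
fraction_inside = inside / m
```

If the sample coefficients were exactly normally distributed, `fraction_inside` would tend to one half; in a run of this kind it should fall close to that value, and `mean_r` should lie near `population_r`.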
MEETING OF THE MINNESOTA SECTION.

The annual spring meeting of the Minnesota Section was held on Saturday, 
May 31, 1919, at Science Hall, Hamline University. There were thirty-one 
present including the following members of the Association: R. M. Barton, W. O. 
Beal, Edla G. Berger, W. H. Bussey, H. H. Dalaker, C. H. Gingrich, R. A. 
Johnson, Sister Mary John (institutional representative), R. M. Mathews,
C. A. V. Peterson, Jessie G. Quigley, W. D. Reeve, Ella A. M. Thorp, C. H.
Yeaton. 

The program consisted of chairman's remarks by C. H. Gingrich, Carleton 
College, a paper by W. P. Swann, of the department of physics, University of 
Minnesota, a book review by W. O. Beal, of the department of astronomy, 
University of Minnesota, a memorial tribute to Father William Earnshaw Etzel 
by Father Moynihan, President of St. Thomas College, a discussion of college 
entrance requirements for mathematics opened by W. D. Reeve, College 
of Education, University of Minnesota, popular talks on mathematics by Jessie 
G. Quigley, College of St. Teresa, and five papers upon mathematical subjects
by R. M. Mathews, Duluth Central High School, W. H. Kirchner of the college
of engineering, University of Minnesota, R. A. Johnson, Hamline University,
D. C. Kazarinoff, Carleton College, and W. H. Bussey, University of Minnesota.
The program was unusually well selected and presented and all papers proved
of interest to the audience. Mr. Reeve, in opening the discussion of college
entrance requirements in mathematics, urged the use of mental tests in addition 
to the method of certification and presented a considerable body of data which 
has grown out of his experience in this work. 

Mr. Mathews's paper on "The graphics of intersections of line and conic" 
appears elsewhere in this issue of the Monthly. 

Mr. Kirchner followed his discussion of last year upon the desirability of 
introducing Descriptive Geometry as a course in college mathematics, with a 
short paper upon "The co-point of three planes" in which he set up the conditions 
for the common point of three planes from the point of view of descriptive 
geometry and interpreted these in terms of the corresponding conditions from the 
view points of analytic geometry and algebra. 

Dr. Johnson's paper dealt with the well-known theorem usually ascribed to 



